February 10, 2021
Here we look at a significant part of Data Preprocessing: Feature Selection. Why is it so important?
Because getting this part wrong can cause the entire model to fail. Now let us see what it is.
It is the process in which we automatically or manually select the variables/features that contribute to predicting the target variable or output. Basically, we are selecting the features that go in as input to the ML Model. Selecting irrelevant variables will reduce the accuracy of the Model.
Correlation Analysis
Chi Square Test
The purpose of correlation analysis is to identify redundant variables (variables that are highly correlated with each other and so carry the same information) before they are fed into the Models. Feeding in variables that are similar will increase the complexity of the Models without adding new information.
Take two variables and check the correlation between them. See below for the detailed calculation.
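For readers who prefer code to spreadsheets, here is a minimal Python sketch of the same idea. It assumes the Pearson correlation coefficient is the measure being used, which this post does not state explicitly. The function names `pearson` and `redundant_pairs`, the sample columns, and the 0.9 threshold are all made up for illustration and are not taken from the original walkthrough.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of the deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)


def redundant_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical sample data: the two height columns carry the same information
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [25, 40, 31, 22, 36],
}

flagged = redundant_pairs(features)
print(flagged)
```

With this sample data only the two height columns are flagged, so one of them could be dropped before training. The threshold is a judgment call, and what counts as "too correlated" depends on the dataset. In Excel, the built-in `CORREL` function computes the same coefficient for two ranges.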
Now let us get into the practical execution in Excel for a better understanding.
I am going to use Excel formulas, which are highlighted in the screenshots below.